Column transformer for segmented data #9
The main use case for this transformer is to apply specified groups of feature functions to specified columns of data, e.g. when dealing with heterogeneous data. The SegmentedColumnTransformer is derived from the sklearn ColumnTransformer and adapted for use inside a Pype object after a segment transformation. The adaptation mainly consists of:
- Adapting the notion of a column: ColumnTransformer iterates over the second dimension, whereas segmented data must be iterated over the third dimension.
- Disabling the "drop" and "passthrough" transform options for simplicity, and dropping non-specified columns by default.

Note: SegmentedColumnTransformer does not support contextual data.
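To illustrate the column selection described above, here is a minimal NumPy sketch. It is not the SegmentedColumnTransformer code from this pull request: `apply_to_columns`, `seg_mean` and `seg_minmax` are hypothetical names, and the data layout `(n_segments, segment_width, n_variables)` is an assumption about what the segment transformation returns.

```python
import numpy as np


def apply_to_columns(X, transformers):
    """Apply each feature function to its own subset of variables.

    X has the assumed shape (n_segments, segment_width, n_variables).
    transformers is a list of (name, func, columns) tuples, loosely
    mirroring the ColumnTransformer convention.
    """
    blocks = []
    for name, func, cols in transformers:
        sub = X[:, :, cols]  # select variables on the third axis
        out = func(sub)      # one row of features per segment
        blocks.append(out.reshape(len(X), -1))
    # Variables not named by any transformer are dropped.
    return np.column_stack(blocks)


def seg_mean(X):
    """Mean over the segment (time) axis."""
    return X.mean(axis=1)


def seg_minmax(X):
    """Minimum and maximum over the segment (time) axis."""
    return np.concatenate([X.min(axis=1), X.max(axis=1)], axis=1)


# Two segments, four samples each, three variables.
X = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)
features = apply_to_columns(
    X,
    [("mean", seg_mean, [0, 1]), ("minmax", seg_minmax, [2])],
)
```

Here `features` has one row per segment: the means of variables 0 and 1, followed by the minimum and maximum of variable 2.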
Thank you Matthias for your work on this. If I understand your aim correctly, you want the API to support specifying which time series variables each feature is computed for? It is true that the API currently supports only feature representations where each feature is computed for all variables. I agree that this is a limitation of the current code.

I would be happy to include this capability, but it needs to work with TS_Data. Did you look at FeatureRep.transform? It should be pretty easy to merge the context data back with the feature data using np.column_stack. Also, maybe pick a better name, e.g. FeatureRepMix, since "columns" makes sense primarily in the context of 2D data. We should also implement f_labels so that the user can retrieve the mapping of features post transform. It would be nice if the unit tests checked the calculation, not just the shape of the returned data.

Let me know if you need any help with this. Once you are done, please submit the pull request on the dev branch so I can thoroughly test it before releasing it to master. Thanks again.
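As a rough sketch of the merge suggested here (not seglearn's actual FeatureRep.transform), the context columns can be appended to the per-segment features with `np.column_stack`, and a label list can record which column is which. The helper names `merge_with_context` and `feature_labels`, and the label format, are hypothetical.

```python
import numpy as np


def merge_with_context(ts_features, context):
    """Append context columns to the per-segment feature matrix."""
    ts_features = np.asarray(ts_features)
    context = np.asarray(context)
    if len(ts_features) != len(context):
        raise ValueError("need one context row per segment")
    return np.column_stack([ts_features, context])


def feature_labels(transformers, n_context):
    """Build one label per output column (hypothetical naming scheme)."""
    labels = [f"{name}_{col}" for name, cols in transformers for col in cols]
    labels += [f"context_{i}" for i in range(n_context)]
    return labels


# Two segments: mean of variable 0 and min of variable 2, plus one context value.
ts_features = np.array([[4.5, 2.0], [16.5, 14.0]])
context = np.array([[70.0], [82.0]])

merged = merge_with_context(ts_features, context)
labels = feature_labels([("mean", [0]), ("min", [2])], n_context=1)
```

A unit test along these lines can then compare `merged` against hand-computed values instead of checking only its shape.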
Hi David,
Yes, that is correct. Furthermore, I wanted the same functionality that ColumnTransformer offers: parallel processing of transformers, not restricted to FeatureRep (hence the naming). I will look into whether I can integrate TS_Data into the SegmentedColumnTransformer, or go with your proposal of implementing a FeatureRepMix that only applies different FeatureRep transforms to different time series variables. In that case we could use the sklearn ColumnTransformer on the output of the FeatureRep transformers.
Cheers,
Thanks Matthias, I think that would be a great addition to seglearn. I think you can use (inherit from) the sklearn ColumnTransformer to do the processing on the time series data, as you did in your previous pull request. I was just suggesting that you name the seglearn class something like FeatureRepMix to avoid confusion. The context data doesn't need to go to ColumnTransformer, so the implementation would look like the current FeatureRep.
Good luck.
Hi David,
I wrote a simple wrapper to use the sklearn ColumnTransformer on segmented data, which is kind of useful when dealing with heterogeneous (multivariate) time series data.
I looked into supporting contextual data but did not find an easy way to make the current code work with the TS_Data class. Maybe copying and adapting the whole ColumnTransformer code, instead of patching parts of it, could lead to a proper solution that supports both.
Nevertheless, I hope you find the SegmentedColumnTransformer to be useful.
Cheers,
Matthias